library(cnefetools)
library(odbr)
library(geobr)
library(sf)
library(dplyr)
library(purrr)
library(mapview)Aggregating census tract variables to traffic zones in the São Paulo Metropolitan Region
Many spatial analyses require socioeconomic or demographic data at a geographic unit that does not match the one in which it was originally collected. This is a particularly common challenge in transportation planning, where the unit of analysis is the traffic zone, a spatial unit whose boundaries often do not coincide with those of census tracts.
Traffic zone delineation follows design rules aimed at keeping intrazonal trips (trips that both start and end within the same zone) below a threshold — typically around 15% of all trips produced by that zone. This constraint, among others, often results in zone boundaries that do not coincide with census tract boundaries. As a result, analysts who need census-derived variables (population, income, racial composition, etc.) at the traffic zone level must transfer them from tracts to zones.
The most straightforward transfer method is areal interpolation: the value assigned to a target polygon is proportional to the share of each source tract’s area it contains (i.e., if a zone covers 30% of a tract’s area, it receives 30% of that tract’s total). The implicit assumption is that the variable of interest is uniformly distributed across the tract. In practice, however, population is rarely uniform — it concentrates along streets, in residential blocks, and away from parks, water bodies, or industrial areas. Applying areal weighting in such settings introduces systematic errors that grow with the mismatch between source and target geometries.
Dasymetric interpolation addresses this by incorporating auxiliary data that captures the spatial distribution of the variable within each source unit. For population-related variables, the natural choice is the location of private households: rather than spreading a tract’s population uniformly across its area, we distribute it equally among its registered dwellings and then aggregate by target polygon. This is precisely the data provided by the Brazilian CNEFE (Cadastro Nacional de Endereços para Fins Estatísticos), and it is what the tracts_to_polygon() function in {cnefetools} uses under the hood. The two-stage procedure for a count variable (e.g., number of residents in private permanent households) is illustrated below.

For average variables (e.g., average household head income), the procedure differs slightly: rather than dividing a tract’s total by the number of dwellings, each dwelling is simply assigned the tract’s average value directly. When aggregating to the target polygon, it receives the average of all assigned dwellings within it.
In a previous vignette I demonstrated this workflow for the municipality of São Paulo. Here I extend it to the entire São Paulo Metropolitan Region (RMSP), which spans 39 municipalities. Because each traffic zone belongs entirely to one municipality, we call tracts_to_polygon() once per municipality with its corresponding zones and bind the results at the end.1
Setup
Traffic zones
The traffic zones for the RMSP come from the 2017 Origin-Destination survey conducted by Metrô São Paulo, available through the {odbr} package. We project to SIRGAS 2000 / UTM Zone 23S (EPSG 31983), the appropriate projected CRS for the São Paulo region.
zones <- read_map(
city = "Sao Paulo",
year = 2017,
harmonize = FALSE,
geometry = "zone"
) |>
st_transform(31983)RMSP municipalities
We use {geobr} to obtain the municipality codes that make up the RMSP and then retrieve their geometries via read_municipality().
rmsp_codes <- read_metro_area(year = 2018) |>
filter(name_metro == "RM São Paulo") |>
pull(code_muni)
rmsp_munis <- read_municipality(code_muni = "SP", year = 2018, simplified = FALSE) |>
filter(code_muni %in% rmsp_codes) |>
st_transform(31983)Aggregating census tract variables to traffic zones
For each municipality we filter the zones that fall within it, then call tracts_to_polygon() to download the CNEFE data, distribute each tract’s pop_ph count (number of residents in private permanent households) equally among the CNEFE dwelling addresses within the tracts, and then aggregate to those zones. (Note: the full loop took approximately 4 minutes to complete on a 14-core, 20-thread CPU — 13th Gen Intel Core i9-13900H, 2.60 GHz — with 32 GB of RAM running Windows 11)
results <- map(rmsp_codes, function(code) {
muni_poly <- rmsp_munis[rmsp_munis$code_muni == code, ]
muni_zones <- st_filter(zones, muni_poly)
tracts_to_polygon(
code_muni = code,
polygon = muni_zones,
vars = "pop_ph",
verbose = F
)
})
zones_final <- bind_rows(results)Map
The map below displays estimated population density (residents in private permanent households per hectare) by traffic zone across the RMSP. The contrast between the dense urban core of São Paulo and the more sparsely populated peripheral municipalities is immediately visible, as are the pockets of high density scattered throughout the metropolitan fringe. Click on any zone to inspect its values.
zones_final <- zones_final |>
mutate(
area_ha = as.numeric(st_area(zones_final)) / 10000, # Area in hectares (ha)
pop_density = round(pop_ph / area_ha,3)
)
mapview(zones_final, zcol = "pop_density", layer.name = "Density (pop/ha)")